Revert: Remove chat feature (accidental direct push) by NotYuSheng · Pull Request #206 · NotYuSheng/OmniPDF

NotYuSheng · 2025-09-30T02:57:10Z

User description

Reverting the accidental direct push to dev. The proper workflow is to create a PR from a feature branch.

This PR reverts commit 7b3c22e which was accidentally pushed directly to dev instead of through a feature branch PR.

The correct PR for removing chat will be created from the feat/remove-chat-feature branch.

PR Type

Enhancement, Tests, Documentation

Description

• Restores complete chat service functionality - Reverts accidental removal of comprehensive RAG-based chat system
• Implements RAG query processing - Adds query validation, document retrieval with ChromaDB, and LLM-based response generation
• Adds streaming chat UI - Creates interactive chat interface with document selection, preset queries, and word-by-word streaming
• Provides comprehensive testing - Includes 206+ unit tests across chat service components and models
• Configures full deployment infrastructure - Adds Helm charts, Kubernetes templates, network policies, and RBAC configuration
• Integrates with existing services - Updates PDF processor, frontend, and infrastructure components for chat service communication
• Adds security and monitoring - Includes Trivy security scanning, Prometheus metrics, and zero-trust network policies
• Updates documentation - Revises architecture diagrams, deployment guides, and service references

Diagram Walkthrough

flowchart LR
  A["Accidental Direct Push"] --> B["Revert Commit"]
  B --> C["Chat Service Core"]
  B --> D["Frontend Chat UI"]
  B --> E["Infrastructure Config"]
  
  C --> C1["RAG Processing"]
  C --> C2["Query Validation"]
  C --> C3["ChromaDB Integration"]
  
  D --> D1["Streaming Interface"]
  D --> D2["Document Selection"]
  D --> D3["Preset Queries"]
  
  E --> E1["Helm Charts"]
  E --> E2["Network Policies"]
  E --> E3["RBAC Configuration"]

File Walkthrough

Relevant files

Enhancement

13 files

rag_config.py `Add comprehensive RAG configuration and optimization system` chat_service/models/rag_config.py • Added comprehensive RAG configuration system with query validation, classification, and optimization • Implemented `EnhancedQueryValidator` with validation examples and prompt generation • Created `RAGConfig` class for model parameters and settings management • Added `QueryClassificationExamples` and `PromptTemplates` for different query types • Implemented `RAGOptimizer` with LLM-based query classification and response post-processing	+569/-0
chat.py `Implement chat router with RAG query processing` chat_service/routers/chat.py • Implemented complete chat endpoint with RAG query processing • Added query validation using LLM before performing RAG operations • Integrated ChromaDB for document retrieval with reranking capabilities • Added session-based document access control and filtering • Implemented response generation with configurable post-processing	+385/-0
5_chat_UI.py `Add chat UI with document selection and streaming` frontend/my_pages/5_chat_UI.py • Created complete chat interface with document selection and preset queries • Implemented streaming response display with configurable word-by-word effect • Added preset buttons for summarization, main topic, and key findings • Integrated session management and document filtering for chat requests	+250/-0
main.py `Integrate chat service into frontend navigation` frontend/main.py • Added `CHAT_URL` environment variable configuration • Integrated chat UI page into the main navigation structure • Updated CSS styling for chat container components	+9/-1
process.py `Add semantic embedder support to processing pipeline` pdf_processor_service/utils/process.py • Added `wait_for_semantic_embedder` function for semantic embedding processing • Updated `process_file_basic` to run semantic and sentence embedders concurrently • Imported `load_or_create_semantic_embedder_job` function	+9/-2
chat.py `Add chat router with session-based access control` pdf_processor_service/routers/chat.py • Created chat router with session validation and document access control • Implemented request proxying to chat service with session ID injection • Added validation for document access permissions per session	+38/-0
chat.py `Add chat data models for PDF processor service` pdf_processor_service/models/chat.py • Defined `ChatRequest` and `ChatResponse` Pydantic models • Added collection name constants for semantic and textual embeddings • Configured default collection to use semantic embeddings	+23/-0
chat.py `Add chat data models for chat service` chat_service/models/chat.py • Created `ChatRequest` model with session ID and document ID support • Implemented `ChatResponse` model with relevant chunks and metadata • Added field validation and default values for collection names	+25/-0
main.py `Integrate chat router into PDF processor service` pdf_processor_service/main.py • Added chat router to the main FastAPI application • Imported and included chat router in the application routing	+2/-1
10_settings_UI.py `Add chat streaming settings to UI preferences` frontend/my_pages/10_settings_UI.py • Added chat settings section with streaming toggle control • Implemented user preference for word-by-word streaming effect • Added help text explaining the streaming functionality	+9/-0
main.py `Create main FastAPI application for chat service` chat_service/main.py • Created main FastAPI application for chat service • Added Prometheus metrics instrumentation • Configured logging and included health and chat routers	+19/-0
process_pdf.py `Add chat service to backend health monitoring` frontend/components/process_pdf.py • Added Chat Service to backend status monitoring • Included `CHAT_URL` environment variable in service health checks	+1/-0
health.py `Add health check endpoint for chat service` chat_service/routers/health.py • Implemented basic health check endpoint • Added simple status response for service monitoring	+8/-0

Tests

5 files

test_chat.py `Add comprehensive unit tests for chat service` chat_service/tests/test_chat.py • Added comprehensive unit tests for chat functionality • Implemented tests for retrieval result processing and chunk reranking • Added tests for RAG query execution and error handling • Created tests for chat endpoint with various scenarios including validation failures	+300/-0
test_models.py `Add unit tests for chat data models` chat_service/tests/test_models.py • Added unit tests for `ChatRequest` and `ChatResponse` Pydantic models • Implemented validation tests for required fields and optional parameters • Added tests for model creation with various input scenarios	+79/-0
conftest.py `Add test configuration with mocked dependencies` chat_service/tests/conftest.py • Created test configuration with mocked external dependencies • Set up environment variables for unit testing • Added fixtures for mocking OpenAI and ChromaDB clients	+45/-0
__init__.py `Initialize tests package for chat service` chat_service/tests/init.py • Created tests package initialization file • Added package marker for chat service test modules	+1/-0
test-connection.yaml `Add Helm test template for chat service` helm/chat-service/templates/tests/test-connection.yaml • New Helm test template for chat service connectivity • Defines health check test using busybox wget • Enables automated testing of service availability	+15/-0

Configuration changes

39 files

deploy-helm-charts.sh `Add chat service to Helm deployment scripts` scripts/deploy-helm-charts.sh • Added chat-service to supported services list • Updated examples and documentation to reference chat-service • Included chat-service in deployment order for all services	+7/-6
load-images.sh `Update image loading examples for chat service` helm/load-images.sh • Updated examples to reference chat_service instead of pdf_extraction_service • Modified documentation and help text for chat service image loading	+5/-5
create-secrets.sh `Add chat service to secret creation script` create-secrets.sh • Added chat-service to secret creation mapping • Updated secret listing pattern to include chat-service-secrets	+4/-2
scan_with_trivy.sh `Add chat service to security scanning pipeline` scripts/scan_with_trivy.sh • Added chat_service to the list of services for security scanning • Included chat service in Trivy vulnerability assessment	+2/-1
test-all-services.sh `Add chat service to testing pipeline` scripts/test-all-services.sh • Added chat_service to the list of services with unit tests • Included chat service in comprehensive testing pipeline	+2/-1
test-single-service.sh `Add chat service to single service testing` scripts/test-single-service.sh • Added chat_service to supported services list for individual testing • Enabled single service test execution for chat service	+1/-0
_helpers.tpl `Add Helm template helpers for chat service` helm/chat-service/templates/_helpers.tpl • Created Helm template helpers for chat-service deployment • Added standard Kubernetes naming and labeling functions • Implemented service account and secret name generation helpers	+73/-0
Makefile `Update Makefile examples for chat service` Makefile • Updated examples and documentation to reference chat-service • Modified help text and usage examples for chat service deployment • Changed example service names from pdf-extraction-service to chat-service	+6/-6
values-prestaging.yaml `Add chat service to Istio gateway configuration` helm/istio-gateway/values-prestaging.yaml • Added chat-service destination rule for Istio service mesh • Configured mutual TLS mode for chat service communication	+4/-0
NOTES.txt `Add Helm chart notes template for chat service` helm/chat-service/templates/NOTES.txt • New Helm chart notes template for chat service deployment • Includes service access instructions, health checks, and testing commands • Provides monitoring and network security information	+61/-0
values.yaml `Add Helm values configuration for chat service` helm/chat-service/values.yaml • New Helm values configuration for chat service • Defines LLM configuration, resources, networking policies, and service settings • Includes Istio sidecar injection and monitoring configuration	+122/-0
values-prestaging.yaml `Add prestaging Helm values for chat service` helm/chat-service/values-prestaging.yaml • Prestaging-specific Helm values for chat service • Configures minimal resources, external vLLM integration, and network policies • Includes disk pressure tolerations and Istio configuration	+104/-0
deployment.yaml `Add Kubernetes deployment template for chat service` helm/chat-service/templates/deployment.yaml • New Kubernetes deployment template for chat service • Defines container configuration, environment variables, and resource limits • Includes LLM configuration and secret management	+92/-0
networkpolicy.yaml `Add NetworkPolicy template for chat service` helm/chat-service/templates/networkpolicy.yaml • New NetworkPolicy template for chat service zero-trust security • Defines ingress rules for PDF processor service communication • Configures egress rules for ChromaDB, MinIO, Redis, and external LLM access	+66/-0
servicemonitor.yaml `Add ServiceMonitor template for chat service monitoring` helm/chat-service/templates/servicemonitor.yaml • New ServiceMonitor template for Prometheus metrics collection • Configures scraping intervals, timeouts, and metric relabeling • Enables monitoring integration for chat service	+41/-0
Dockerfile `Add Dockerfile for chat service container` chat_service/Dockerfile • New Dockerfile for chat service container image • Configures Python 3.13 environment with FastAPI and dependencies • Includes security hardening with non-root user	+36/-0
values.yaml `Add RBAC configuration for chat service` helm/rbac/values.yaml • Added `chat-service` to service accounts list • Added chat-service RBAC configuration with ChromaDB, MinIO, and Redis access • Updated pdf-processor-service to include chat-service in canCall list	+12/-0
hpa.yaml `Add HPA template for chat service autoscaling` helm/chat-service/templates/hpa.yaml • New HorizontalPodAutoscaler template for chat service • Configures CPU and memory-based autoscaling • Enables dynamic scaling based on resource utilization	+32/-0
poddisruptionbudget.yaml `Add PodDisruptionBudget template for chat service` helm/chat-service/templates/poddisruptionbudget.yaml • New PodDisruptionBudget template for chat service high availability • Configures minimum available replicas during cluster maintenance • Ensures service availability during rolling updates	+23/-0
example.env `Add environment configuration for chat service` chat_service/example.env • New environment configuration file for chat service • Defines LLM settings, RAG optimization parameters, and Redis connection • Includes model configuration and context management settings	+23/-0
example.env `Update image captioner service environment configuration` image_captioner_service/example.env • Updated OPENAI_BASE_URL from IP address to hostname • Changed OPENAI_API_KEY from token format to lm-studio	+2/-2
docker-compose.yml `Add chat service to Docker Compose configuration` docker-compose.yml • Added `chat_service` container configuration • Configured build context and environment file • Added dependency on embedder_service	+10/-0
images.txt `Add chat service image to Helm images list` helm/images.txt • Added `ghcr.io/notyusheng/chat_service:dev-v0.0.3-5d69f89` to image list • Updated core application services section	+1/-0
Chart.yaml `Add Helm chart metadata for chat service` helm/chat-service/Chart.yaml • New Helm chart metadata for chat service • Defines chart version, description, and keywords • Configures chart as application type with RAG capabilities	+18/-0
service.yaml `Add Kubernetes Service template for chat service` helm/chat-service/templates/service.yaml • New Kubernetes Service template for chat service • Configures ClusterIP service with port 8000 • Defines service selector and port mapping	+15/-0
serviceaccount.yaml `Add ServiceAccount template for chat service` helm/chat-service/templates/serviceaccount.yaml • New ServiceAccount template for chat service • Configures service account creation and token mounting • Includes metadata labels and annotations support	+13/-0
values-prestaging.yaml `Update ChromaDB network policy for chat service access` helm/chromadb/values-prestaging.yaml • Added chat-service to NetworkPolicy ingress allowedCallers • Updated network policy to allow chat service access to ChromaDB	+4/-0
values.yaml `Update ChromaDB network policy for chat service access` helm/chromadb/values.yaml • Added chat-service to NetworkPolicy ingress allowedCallers • Updated network policy to allow chat service access to ChromaDB	+4/-0
example.env `Add chat service URL to PDF processor configuration` pdf_processor_service/example.env • Added `CHAT_SERVICE_URL=http://chat_service:8000/chat` environment variable • Updated service URL configuration to include chat service endpoint	+1/-0
values-prestaging.yaml `Update PDF processor network policy for chat service` helm/pdf-processor-service/values-prestaging.yaml • Added chat-service to NetworkPolicy egress allowedTargets • Updated network policy to allow PDF processor to communicate with chat service	+5/-0
values.yaml `Update PDF processor network policy for chat service` helm/pdf-processor-service/values.yaml • Added chat-service to NetworkPolicy egress allowedTargets • Updated network policy to allow PDF processor to communicate with chat service	+5/-0
values.yaml `Add chat service to Istio gateway destination rules` helm/istio-gateway/values.yaml • Added `chat-service.omnipdf-prestaging.svc.cluster.local` to destination rules • Configured mTLS traffic policy for chat service	+4/-0
values-prestaging.yaml `Update Redis network policy for chat service access` helm/redis/values-prestaging.yaml • Added chat-service to NetworkPolicy ingress allowedCallers • Updated network policy to allow chat service access to Redis	+4/-0
values-prestaging.yaml `Update MinIO network policy for chat service access` helm/minio/values-prestaging.yaml • Added chat-service to NetworkPolicy ingress allowedCallers • Updated network policy to allow chat service access to MinIO	+4/-0
example.env `Add chat service URL to frontend configuration` frontend/example.env • Added `CHAT_URL=http://chat_service:8000` environment variable • Updated frontend configuration to include chat service endpoint	+1/-0
values.yaml `Update Redis network policy for chat service access` helm/redis/values.yaml • Added chat-service to NetworkPolicy ingress allowedCallers • Updated network policy to allow chat service access to Redis	+4/-0
values.yaml `Update MinIO network policy for chat service access` helm/minio/values.yaml • Added chat-service to NetworkPolicy ingress allowedCallers • Updated network policy to allow chat service access to MinIO	+4/-0
example.env `Add chat service URL to nginx configuration` nginx/example.env • Added `CHAT_URL=http://chat_service:8001` environment variable • Updated nginx configuration to include chat service endpoint	+1/-0
services-to-build.txt `Add chat service to GitHub Actions build list` .github/services-to-build.txt • Added `chat_service` to the list of core application services to build • Updated CI/CD build configuration to include chat service	+1/-0

Miscellaneous

1 files

chat_service-report.txt `Add security scan report for chat service` trivy_scan_results/chat_service-report.txt • Generated security scan report for chat service Docker image • Shows clean scan results with no high or critical vulnerabilities • Includes comprehensive package and filesystem vulnerability assessment	+371/-0

Documentation

9 files

README.md `Restore chat service references in main documentation` README.md • Added chat service to the architecture description and microservices list • Updated service count from 13 to 14 individual RBAC roles • Added chat-service to network policy tables and deployment examples • Updated test count from 180+ to 206+ tests across 7 services	+21/-20
SECRET-MANAGEMENT.md `Update secret management examples for chat service` helm/SECRET-MANAGEMENT.md • Updated examples to use `chat-service` instead of `pdf-extraction-service` • Changed secret references from `pdf-extraction-service-secrets` to `chat-service-secrets` • Modified command examples and service references throughout	+25/-25
LOCAL_IMAGE_REPOSITORY.md `Update local image repository examples for chat service` helm/LOCAL_IMAGE_REPOSITORY.md • Replaced `pdf_extraction_service` references with `chat_service` in examples • Updated Docker image names and repository paths • Modified deployment and configuration examples	+21/-21
c4-diagram.puml `Add chat service to C4 architecture diagram` c4-diagram.puml • Added `chat_service` container definition with RAG capabilities • Updated frontend description to include chat interface • Added relationships between chat service and other components (ChromaDB, Redis, MinIO, vLLM)	+8/-2
README.md `Update testing documentation for chat service` scripts/README.md • Updated examples to use `chat_service` instead of `pdf_extraction_service` • Changed service count from 6 to 7 services in testing documentation • Updated test output examples and service descriptions	+18/-18
README.md `Update RBAC documentation for chat service` helm/rbac/README.md • Updated service coverage from 13/13 to 14/14 complete • Added `chat-service` to service table and data access patterns • Updated ChromaDB access patterns to include chat service	+12/-9
NETWORK-POLICY-REFERENCE.md `Update network policy reference for chat service` helm/NETWORK-POLICY-REFERENCE.md • Added `chat-service` to external AI communication table • Updated service references and communication patterns • Modified ChromaDB access patterns to include chat service	+4/-4
INSTALL.md `Update Istio gateway installation for chat service` helm/istio-gateway/INSTALL.md • Added `chat-service` to service deployment loop • Updated service communication testing examples • Modified deployment instructions to include chat service	+2/-2
Chart.yaml `Update frontend chart description for chat functionality` helm/frontend/Chart.yaml • Updated chart description to include "and chat" functionality • Modified description from PDF processing only to include chat interface	+1/-1

Dependencies

1 files

requirements.txt `Add Python requirements for chat service` chat_service/requirements.txt • New Python dependencies file for chat service • Includes FastAPI, OpenAI, ChromaDB, and monitoring libraries • Defines specific versions for all required packages	+8/-0

Additional files

4 files

__init__.py	[link]
__init__.py	[link]
values-prod.yaml	[link]
values-staging.yaml	[link]

This reverts commit 7b3c22e.

gemini-code-assist · 2025-09-30T02:58:23Z

Summary of Changes

Hello @NotYuSheng, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a revert of a previous commit that accidentally removed the chat feature by being pushed directly to the dev branch. The primary purpose of this revert is to restore the chat feature, ensuring that its removal (which will happen in a future PR) follows the proper development workflow through a dedicated feature branch pull request. This action temporarily re-adds the chat functionality that was prematurely removed.

Highlights

Revert Accidental Push: This pull request reverts a previous commit that was accidentally pushed directly to the dev branch, which had removed the chat_service.
Re-introduction of Chat Service: The chat_service and all its associated files, including its Dockerfile, environment variables, FastAPI application, Pydantic models, RAG optimization logic, and unit tests, are re-introduced.
Configuration and Documentation Updates: Various configuration files and documentation, such as Makefile, README.md, Helm charts, docker-compose.yml, frontend UI, and scripts, are updated to correctly integrate the chat_service back into the system.
Restoration of Service Integration: The chat_service's presence is restored in build processes, deployment configurations, network policies, and RBAC roles across the project.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

codiumai-pr-agent-free · 2025-09-30T02:58:36Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Error Handling The chat endpoint doesn't properly handle the case when doc_ids is None but the validation requires checking each doc_id. This could lead to runtime errors if doc_ids is None. OPENAI_MODEL_NAME = rag_config.model_name Error Handling The chat_with_rag function has a max_retries of 600 with 1 second delay, which could lead to extremely long waiting times (10 minutes) before timing out. Consider a more reasonable timeout strategy. async def chat_with_rag(prompt: str, doc_ids: list[str] = None, collection_name: str = "SemanticEmbeds", max_retries: int = 600, delay: int = 1): """ Performance Concern The post_process_llm_response function uses a potentially inefficient approach to remove duplicate lines by comparing each line with the previous one, which could be slow for very large responses. def post_process_llm_response(response: str) -> str: """Post-process LLM response for better formatting""" # Remove only consecutive duplicate lines to avoid breaking formatted content. lines = response.split("\n") if not lines: return "" filtered_lines = [lines[0]] for i in range(1, len(lines)): current_line_stripped = lines[i].strip() prev_line_stripped = lines[i - 1].strip() if not current_line_stripped or current_line_stripped != prev_line_stripped: filtered_lines.append(lines[i]) cleaned_response = "\n".join(filtered_lines) # Ensure response ends properly if cleaned_response and not cleaned_response.rstrip().endswith( (".", "!", "?", ":") ): cleaned_response = cleaned_response.rstrip() + "." return cleaned_response.strip()

codiumai-pr-agent-free · 2025-09-30T02:59:49Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix JSON decoding error in frontend In the frontend chat UI, parse the response from the chat service using `json.loads(response.text)` instead of `response.json()` to fix a JSON decoding error. frontend/my_pages/5_chat_UI.py [61-68] try: - logger.info(f"Decode JSON from response: {response}") + logger.info(f"Decode JSON from response: {response.text}") logger.info(f"Decode JSON from headers: {response.headers}") - data = response.json() + data = json.loads(response.text) except json.JSONDecodeError as e: logger.error(f"Failed to decode JSON from response: {response.text}: {e}") server_status.error("Received an invalid response from the server.") return "Sorry, I received an invalid response from the server." Apply / Chat Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a double JSON encoding issue due to the proxy layer, which would cause a runtime `JSONDecodeError` and break the chat functionality.	High
	Fix broken external service connectivity Add a missing egress rule for HTTP (port 80) to the `networkpolicy.yaml` template to allow connectivity to the LLM service, which is configured with an HTTP endpoint. helm/chat-service/templates/networkpolicy.yaml [45-50] +{{- if .Values.networkPolicy.egress.allowHTTP }} +- to: [] + ports: + - protocol: TCP + port: 80 +{{- end }} {{- if .Values.networkPolicy.egress.allowHTTPS }} - to: [] ports: - protocol: TCP port: 443 {{- end }} Apply / Chat Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a critical bug where the network policy blocks HTTP egress traffic, preventing the service from connecting to its configured LLM endpoint and rendering it non-functional.	High
	Fix inconsistent service port configuration Standardize the `chat_service` port in `nginx/example.env` from `8001` to `8000` to match other configurations and prevent connection errors. nginx/example.env [2] -CHAT_URL=http://chat_service:8001 +CHAT_URL=http://chat_service:8000 `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a critical port inconsistency for the `chat_service` across different configuration files, which would lead to a runtime failure of the service.	High
General	Use default factory for list field In the `ChatRequest` model, change the default for `doc_ids` from `None` to an empty list by using `Field(default_factory=list)`. chat_service/models/chat.py [5-13] class ChatRequest(BaseModel): """ Request model for chat API endpoints. """ message: str session_id: str - doc_ids: List[str] = None + doc_ids: List[str] = Field(default_factory=list) collection_name: str = Field(default="default_collection", description="ChromaDB collection name") Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly points out that using `default_factory=list` is better practice for mutable defaults in Pydantic models, preventing potential `TypeError` exceptions.	Medium
	Generalize default Helm chart values Generalize the default `values.yaml` by removing hardcoded, environment-specific values and replacing them with generic defaults or placeholders to improve chart reusability. helm/chat-service/values.yaml [1-31] -# Pre-staging environment overrides - minimal for disk pressure +# Default values for chat-service. +# This is a YAML-formatted file. +# Declare variables to be passed into your templates. replicaCount: 1 -# Pre-staging LLM configuration - External vLLM service on VPN llm: - baseUrl: "http://100.108.2.57/vllm_qwen2.5/v1" - model: "Qwen2.5-14B-Coder-Instruct" # Model name as served by external vLLM + baseUrl: "" # e.g., http://my-vllm-service/v1 + model: "" # e.g., Qwen2.5-14B-Coder-Instruct -# Pre-staging resources - minimal for disk pressure resources: + limits: + cpu: 500m + memory: 512Mi + requests: + cpu: 200m + memory: 256Mi ... -# Pre-staging specific configurations - using local CRC registry image: - repository: default-route-openshift-image-registry.apps-crc.testing/omnipdf/chat_service - tag: "dev-v0.0.3-5d69f89" - pullPolicy: Always # Use local registry image + repository: omnipdf/chat_service + tag: "" # Overridden in environment-specific values files + pullPolicy: IfNotPresent ... `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 7 __ Why: The suggestion correctly points out that the default `values.yaml` is polluted with environment-specific values, which is against Helm best practices and harms reusability.	Medium
	Rename test for clarity and accuracy Rename the test `test_invalid_chat_response_no_metadata` to be more specific, clarifying it tests the `ChatResponse` model from the `chat_service`. chat_service/tests/test_models.py [76-79] -def test_invalid_chat_response_no_metadata(self): - """Test ChatResponse validation without metadata""" +def test_chat_service_invalid_response_no_metadata(self): + """Test ChatResponse validation without metadata for chat_service""" with pytest.raises(ValidationError): ChatResponse(response="AI response") Apply / Chat Suggestion importance[1-10]: 3 __ Why: The suggestion correctly identifies a potentially confusing test name and proposes a more descriptive one, which improves test suite clarity and maintainability.	Low
Security	Optimize Docker image with multi-stage build Optimize the Docker image by implementing a multi-stage build to remove build-time dependencies, resulting in a smaller and more secure final image. chat_service/Dockerfile [7-25] +# --- Builder Stage --- +FROM python:3.13-slim as builder + +WORKDIR /app + RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/* -# Copy requirements.txt from chat_service folder in root context COPY chat_service/requirements.txt . +RUN pip install --no-cache-dir --prefix="/install" -r requirements.txt -RUN pip install --no-cache-dir -r requirements.txt +# --- Final Stage --- +FROM python:3.13-slim -# Copy utils folder from root context into /app/utils +WORKDIR /app + +ENV PYTHONPATH=/app/chat_service:/app/shared_utils + +COPY --from=builder /install /usr/local COPY shared_utils ./shared_utils - -# Copy chat_service folder from root context into /app/chat_service COPY chat_service ./chat_service -# Remove unnecessary system packages -RUN apt-get update && \ - DEBIAN_FRONTEND=noninteractive apt-get remove --purge -y linux-libc-dev && \ - apt-get autoremove -y && \ - apt-get clean && \ - rm -rf /var/lib/apt/lists/* +# Create non-root user for security and set permissions +RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /bin/bash appuser && \ + chown -R appuser:appuser /app +USER appuser + +# Expose the FastAPI port +EXPOSE 8000 + +CMD ["sh", "-c", "PYTHONPATH=/app/chat_service uvicorn chat_service.main:app --host 0.0.0.0 --port 8000"] + Apply / Chat Suggestion importance[1-10]: 6 __ Why: The suggestion proposes a multi-stage build to create a smaller and more secure Docker image, which is a valuable best practice for production environments.	Low
More

gemini-code-assist

Code Review

This pull request reverts an accidental commit and correctly re-introduces the chat feature. This is a substantial addition, including a new chat_service, frontend UI components, extensive RAG-based processing logic, and all the necessary infrastructure and configuration changes. The implementation is comprehensive and well-structured. My review focuses on several critical bugs in the tests that would prevent them from passing, a potential runtime bug in the Dockerfile configuration, and several medium-severity suggestions to improve maintainability, robustness, and consistency across the new services.

gemini-code-assist · 2025-09-30T03:01:17Z

+ENV PYTHONPATH=/app/chat_service:/app/shared_utils
+
+RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/*
+
+# Copy requirements.txt from chat_service folder in root context
+COPY chat_service/requirements.txt .
+
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy utils folder from root context into /app/utils
+COPY shared_utils ./shared_utils
+
+# Copy chat_service folder from root context into /app/chat_service
+COPY chat_service ./chat_service
+
+# Remove unnecessary system packages
+RUN apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get remove --purge -y linux-libc-dev && \
+    apt-get autoremove -y && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Create non-root user for security and set permissions
+RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /bin/bash appuser && \
+    chown -R appuser:appuser /app
+
+USER appuser
+
+# Expose the FastAPI port    
+EXPOSE 8000
+
+CMD ["sh", "-c", "PYTHONPATH=/app/chat_service uvicorn chat_service.main:app --host 0.0.0.0 --port 8000"]


The CMD instruction is incorrectly overriding the PYTHONPATH environment variable, which will likely cause ImportError at runtime. Additionally, the Dockerfile can be optimized to reduce image size by combining RUN layers.

CMD Bug: The ENV at line 5 correctly sets PYTHONPATH to include both chat_service and shared_utils. However, the CMD at line 36 re-defines PYTHONPATH for its execution scope but omits /app/shared_utils. This will prevent modules from shared_utils from being imported.

Layer Optimization: The Dockerfile runs apt-get update twice (lines 7 and 21) and uses separate RUN commands to install and then remove packages. This creates unnecessary layers. These steps should be combined into a single RUN command to install build dependencies, use them to install Python packages, and then clean them up in the same layer.

I suggest refactoring the CMD to use the exec form and rely on the ENV variable, and consolidating the RUN commands for better efficiency and a smaller final image.

ENV PYTHONPATH=/app RUN apt-get update && \ apt-get install -y --no-install-recommends build-essential && \ rm -rf /var/lib/apt/lists/* # Copy requirements.txt from chat_service folder in root context COPY chat_service/requirements.txt . RUN pip install --no-cache-dir -r requirements.txt && \ apt-get purge -y --auto-remove build-essential && \ rm -rf /var/lib/apt/lists/* # Copy utils folder from root context into /app/utils COPY shared_utils ./shared_utils # Copy chat_service folder from root context into /app/chat_service COPY chat_service ./chat_service # Create non-root user for security and set permissions RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /bin/bash appuser && \ chown -R appuser:appuser /app USER appuser # Expose the FastAPI port EXPOSE 8000 CMD ["uvicorn", "chat_service.main:app", "--host", "0.0.0.0", "--port", "8000"]

gemini-code-assist · 2025-09-30T03:01:17Z

+            result = await perform_rag_query(
+                query="test query",
+                collection_name="test_collection",
+                top_k=5,
+                openai_client=AsyncMock(),
+            )
+
+            user_prompt, chunks, system_prompt, query_type = result
+            assert "couldn't find any relevant information" in user_prompt
+            assert chunks == []
+            assert query_type == "general"
+
+    @pytest.mark.asyncio
+    async def test_successful_rag_query(self, sample_chroma_results):
+        """Test successful RAG query execution"""
+        mock_chunks = [
+            {"content": "test content", "similarity_score": 0.9, "doc_id": "doc1"}
+        ]
+
+        with (
+            patch("routers.chat.get_chroma_client") as mock_chroma,
+            patch("routers.chat.prepare_retrieval_results") as mock_prepare,
+            patch("routers.chat.rag_optimizer") as mock_optimizer,
+            patch("routers.chat.prompt_templates") as mock_templates,
+        ):
+            mock_collection = AsyncMock()
+            mock_collection.query.return_value = sample_chroma_results
+            mock_chroma.return_value.get_collection.return_value = mock_collection
+            mock_prepare.return_value = mock_chunks
+
+            mock_optimizer.chunk_optimization.return_value = (
+                mock_chunks,
+                "test context",
+            )
+            # Make detect_query_type async mock
+            mock_optimizer.detect_query_type = AsyncMock(return_value="summary")
+
+            mock_templates.get_system_prompt.return_value = "system prompt"
+            mock_templates.format_user_prompt.return_value = "user prompt"
+
+            result = await perform_rag_query(
+                query="test query",
+                collection_name="test_collection",
+                top_k=5,
+                openai_client=AsyncMock(),
+            )


The calls to perform_rag_query in TestPerformRAGQuery are missing the required session_id argument. The function signature for perform_rag_query requires session_id as a positional argument, so these tests will raise a TypeError and fail to run. Please add a mock session_id to these calls.

gemini-code-assist · 2025-09-30T03:01:17Z

+class TestChatRequest:
+    def test_valid_chat_request(self):
+        """Test creation of valid ChatRequest"""
+        request = ChatRequest(
+            message="What is this document about?", collection_name="test_collection"
+        )
+
+        assert request.message == "What is this document about?"
+        assert request.collection_name == "test_collection"
+        assert request.doc_id is None
+
+    def test_chat_request_with_doc_id(self):
+        """Test ChatRequest with optional doc_id"""
+        request = ChatRequest(
+            message="What is this document about?",
+            collection_name="test_collection",
+            doc_id="doc123",
+        )
+
+        assert request.doc_id == "doc123"
+
+    def test_chat_request_default_collection(self):
+        """Test ChatRequest with default collection name"""
+        request = ChatRequest(message="Test message")
+        assert request.collection_name == "default_collection"
+
+    def test_chat_request_empty_message_allowed(self):
+        """Test ChatRequest allows empty message (Pydantic v2 behavior)"""
+        # In Pydantic v2, empty strings are valid unless explicitly constrained
+        request = ChatRequest(message="", collection_name="test")
+        assert request.message == ""
+        assert request.collection_name == "test"
+
+    def test_invalid_chat_request_no_message(self):
+        """Test ChatRequest validation without message"""
+        with pytest.raises(ValidationError):
+            ChatRequest(collection_name="test")


The tests for the ChatRequest model are incorrect and will fail due to several issues:

Missing session_id: The ChatRequest model requires a session_id, but it is not provided in any of the test instantiations, which will cause a ValidationError.

Incorrect Attribute Name: The tests refer to doc_id (singular), but the model attribute is doc_ids (plural).

Incorrect Attribute Type: The tests assign a string to doc_id, but the model expects doc_ids to be a List[str].

These tests need to be corrected to align with the ChatRequest model definition.

gemini-code-assist · 2025-09-30T03:01:17Z

+from typing import List, Dict, Any, Optional
+from openai import AsyncOpenAI, APIError
+import os
+import logging
+from enum import Enum
+
+logger = logging.getLogger(__name__)
+
+
+class EnhancedQueryValidator:
+    """Enhanced query validation system for LLM"""
+
+    def __init__(self):
+        self.validation_examples = self._get_validation_examples()
+
+    def _get_validation_examples(self) -> List[Dict[str, Any]]:
+        """Get comprehensive examples for query validation"""
+        return [
+            # PROCEED_WITH_RAG examples
+            {
+                "query": "Describe the CEO's role in the company",
+                "decision": "PROCEED_WITH_RAG",
+                "reason": "Factual extraction from specific documents",
+            },
+            {
+                "query": "What are the main findings in the quarterly report?",
+                "decision": "PROCEED_WITH_RAG",
+                "reason": "Specific document content request",
+            },
+            {
+                "query": "How does the company's revenue compare to last year?",
+                "decision": "PROCEED_WITH_RAG",
+                "reason": "Analytical question requiring document data",
+            },
+            {
+                "query": "List all the safety protocols mentioned in the manual",
+                "decision": "PROCEED_WITH_RAG",
+                "reason": "Factual extraction from specific documents",
+            },
+            {
+                "query": "Summarize the key recommendations from the research paper",
+                "decision": "PROCEED_WITH_RAG",
+                "reason": "Summarization of document content",
+            },
+            # HANDLE_WITHOUT_RAG examples
+            {
+                "query": "What is machine learning?",
+                "decision": "HANDLE_WITHOUT_RAG",
+                "reason": "General knowledge question not requiring specific documents",
+            },
+            {
+                "query": "How do I calculate compound interest?",
+                "decision": "HANDLE_WITHOUT_RAG",
+                "reason": "General procedural knowledge",
+            },
+            {
+                "query": "Hello, how are you?",
+                "decision": "HANDLE_WITHOUT_RAG",
+                "reason": "Conversational greeting",
+            },
+            {
+                "query": "Can you help me brainstorm ideas for a presentation?",
+                "decision": "HANDLE_WITHOUT_RAG",
+                "reason": "Creative assistance not requiring document search",
+            },
+            # INVALID_QUERY examples
+            {
+                "query": "banana",
+                "decision": "INVALID_QUERY",
+                "reason": "Single word without context",
+            },
+            {
+                "query": "ajsdlkfj askdjf",
+                "decision": "INVALID_QUERY",
+                "reason": "Gibberish text",
+            },
+            {
+                "query": "???",
+                "decision": "INVALID_QUERY",
+                "reason": "No meaningful content",
+            },
+            # NEEDS_CLARIFICATION examples
+            {
+                "query": "Tell me about it",
+                "decision": "NEEDS_CLARIFICATION",
+                "reason": "Ambiguous reference - what is 'it'?",
+            },
+            {
+                "query": "What about the data?",
+                "decision": "NEEDS_CLARIFICATION",
+                "reason": "Vague reference - which data and what aspect?",
+            },
+            {
+                "query": "More information please",
+                "decision": "NEEDS_CLARIFICATION",
+                "reason": "No specific topic or context provided",
+            },
+        ]
+
+    def get_enhanced_validation_prompt(
+        self, query: str, collection_info: Optional[str] = None
+    ) -> str:
+        """Generate enhanced validation prompt with context awareness"""
+
+        examples_text = "\n".join(
+            [
+                f'Query: "{ex["query"]}" -> {ex["decision"]} ({ex["reason"]})'
+                for ex in self.validation_examples
+            ]
+        )
+
+        collection_context = ""
+        if collection_info:
+            collection_context = f"\nCollection Context: {collection_info}"
+
+        return f"""You are an intelligent query router for a document-based Q&A system using RAG (Retrieval Augmented Generation).
+
+Your task is to analyze user queries and decide the best approach to handle them.
+
+DECISION OPTIONS:
+1. PROCEED_WITH_RAG - Query needs document search and retrieval
+   - Asks for specific information from documents
+   - Requires analysis of document content
+   - Seeks factual data, summaries, or insights from specific sources
+
+2. HANDLE_WITHOUT_RAG - Query can be answered with general knowledge
+   - General knowledge questions
+   - Conversational queries
+   - Procedural or how-to questions not requiring specific documents
+   - Greetings and casual conversation
+
+3. INVALID_QUERY - Query is meaningless or too vague
+   - Gibberish or random characters
+   - Single words without context
+   - Incomplete thoughts
+
+4. NEEDS_CLARIFICATION - Query is ambiguous and needs more context
+   - Vague references without clear subjects
+   - Missing important context
+   - Ambiguous pronouns or references
+
+EXAMPLES:
+{examples_text}{collection_context}
+
+INSTRUCTIONS:
+Analyze the query considering:
+1. Specificity - Is the query specific enough to be actionable?
+2. Context dependency - Does it require specific document content?
+3. Intent clarity - Is the user's intent clear?
+4. Scope - Is it asking for general knowledge vs. specific document information?
+
+Respond in this exact format:
+DECISION: [PROCEED_WITH_RAG|HANDLE_WITHOUT_RAG|INVALID_QUERY|NEEDS_CLARIFICATION]
+CONFIDENCE: [HIGH|MEDIUM|LOW]
+REASON: [Brief explanation of your decision]
+SUGGESTION: [If not PROCEED_WITH_RAG, suggest how to handle or what clarification is needed]
+
+Query: "{query}"
+"""
+
+
+class QueryType(Enum):
+    """Enumeration of supported query types"""
+
+    GENERAL = "general"
+    FACTUAL = "factual"
+    ANALYTICAL = "analytical"
+    SUMMARIZATION = "summarization"
+
+
+class RAGConfig:
+    """Configuration class for the model's RAG optimization"""
+
+    def __init__(self):
+        self.model_name = os.getenv("OPENAI_MODEL")
+
+        # LLM's Generation parameters optimized for RAG
+        self.generation_params = {
+            "temperature": float(os.getenv("MODEL_TEMPERATURE", "0.1")),
+            "max_tokens": int(os.getenv("MODEL_MAX_TOKENS", "2000")),
+            "top_p": float(os.getenv("MODEL_TOP_P", "0.8")),
+            "frequency_penalty": float(os.getenv("MODEL_FREQ_PENALTY", "0.1")),
+            "presence_penalty": float(os.getenv("MODEL_PRESENCE_PENALTY", "0.1")),
+        }
+
+        # Query parameters (lighter settings for query validation and classification)
+        self.validation_params = {
+            "temperature": 0.0,
+            "max_tokens": 500,
+            "top_p": float(os.getenv("MODEL_TOP_P", "0.8")),
+            "frequency_penalty": 0.0,
+            "presence_penalty": 0.0,
+        }
+
+        # Context management
+        self.max_context_length = int(os.getenv("MODEL_MAX_CONTEXT", "4000"))
+
+        # RAG-specific settings
+        self.min_similarity_score = float(os.getenv("MODEL_MIN_SIMILARITY", "0.1"))
+        self.enable_reranking = (
+            os.getenv("MODEL_ENABLE_RERANKING", "true").lower() == "true"
+        )
+
+        # RAG Optimization flags
+        self.enable_llm_query_classification = (
+            os.getenv("ENABLE_LLM_QUERY_CLASSIFICATION", "true").lower() == "true"
+        )
+        self.enable_response_post_processing = (
+            os.getenv("ENABLE_RESPONSE_POST_PROCESSING", "true").lower() == "true"
+        )
+
+
+class QueryClassificationExamples:
+    """Few-shot examples for query type classification"""
+
+    @staticmethod
+    def get_classification_examples() -> List[Dict[str, str]]:
+        """Returns comprehensive examples for each query type"""
+        return [
+            # Factual queries - seeking specific information, data, or facts
+            {"query": "What is the capital of France?", "type": "factual"},
+            {
+                "query": "When was the Declaration of Independence signed?",
+                "type": "factual",
+            },
+            {"query": "How many employees work at the company?", "type": "factual"},
+            {"query": "List all the ingredients in the recipe.", "type": "factual"},
+            {
+                "query": "What are the system requirements for the software?",
+                "type": "factual",
+            },
+            {"query": "Who is the CEO of the organization?", "type": "factual"},
+            {"query": "Define machine learning.", "type": "factual"},
+            {"query": "What is my shopping list?", "type": "factual"},
+            {"query": "Show me the sales figures for last quarter.", "type": "factual"},
+            # Analytical queries - requiring analysis, comparison, evaluation
+            {
+                "query": "Why did the stock price decline last month?",
+                "type": "analytical",
+            },
+            {
+                "query": "How does renewable energy compare to fossil fuels?",
+                "type": "analytical",
+            },
+            {
+                "query": "Analyze the pros and cons of remote work.",
+                "type": "analytical",
+            },
+            {
+                "query": "What are the implications of this policy change?",
+                "type": "analytical",
+            },
+            {
+                "query": "Evaluate the effectiveness of the marketing campaign.",
+                "type": "analytical",
+            },
+            {
+                "query": "How do these two products differ in performance?",
+                "type": "analytical",
+            },
+            {
+                "query": "Assess the risks associated with this investment.",
+                "type": "analytical",
+            },
+            {
+                "query": "What factors contributed to the project's success?",
+                "type": "analytical",
+            },
+            {
+                "query": "Examine the relationship between customer satisfaction and retention.",
+                "type": "analytical",
+            },
+            # Summarization queries - requesting condensed overview or summary
+            {
+                "query": "Summarize the main points of the report.",
+                "type": "summarization",
+            },
+            {
+                "query": "Give me an overview of the quarterly results.",
+                "type": "summarization",
+            },
+            {"query": "Summarize my shopping list.", "type": "summarization"},
+            {
+                "query": "Provide a summary of the meeting minutes.",
+                "type": "summarization",
+            },
+            {
+                "query": "What are the key takeaways from the research paper?",
+                "type": "summarization",
+            },
+            {
+                "query": "Condense the main findings of the survey.",
+                "type": "summarization",
+            },
+            {
+                "query": "Give me the highlights of the project status.",
+                "type": "summarization",
+            },
+            {
+                "query": "Summarize the customer feedback trends.",
+                "type": "summarization",
+            },
+            {
+                "query": "Provide an executive summary of the proposal.",
+                "type": "summarization",
+            },
+            # General queries - conversational, complex, or multi-faceted
+            {"query": "How can I improve my productivity?", "type": "general"},
+            {"query": "Tell me about artificial intelligence.", "type": "general"},
+            {"query": "What should I consider when buying a car?", "type": "general"},
+            {"query": "Help me understand this concept better.", "type": "general"},
+            {"query": "Can you explain how this process works?", "type": "general"},
+            {
+                "query": "What are some best practices for project management?",
+                "type": "general",
+            },
+            {"query": "I need advice on career development.", "type": "general"},
+            {"query": "How do I troubleshoot this technical issue?", "type": "general"},
+            {
+                "query": "What would you recommend for this situation?",
+                "type": "general",
+            },
+        ]
+
+
+class PromptTemplates:
+    """Specialized prompt templates for different types of RAG queries with LLM"""
+
+    @staticmethod
+    def get_classification_prompt(query: str) -> str:
+        """Generate prompt for LLM-based query classification"""
+        examples = QueryClassificationExamples.get_classification_examples()
+
+        # Create few-shot examples string
+        examples_text = "\n".join(
+            [
+                f'Query: "{example["query"]}" -> Type: {example["type"]}'
+                for example in examples
+            ]
+        )
+
+        return f"""You are an expert at classifying user queries into specific types for optimal response generation.
+
+Query Types:
+- factual: Requests for specific information, data, facts, definitions, or lists
+- analytical: Requests for analysis, comparison, evaluation, or reasoning about relationships and causes
+- summarization: Requests for summaries, overviews, main points, or condensed information
+- general: Conversational queries, advice-seeking, explanations, or multi-faceted questions
+
+Examples:
+{examples_text}
+
+Instructions:
+- Analyze the user's intent and primary goal
+- Consider the expected response format and depth
+- Classify based on what type of processing would best serve the user
+- Respond with only one word: factual, analytical, summarization, or general
+
+Query: "{query}"
+Type:"""
+
+    @staticmethod
+    def get_system_prompt(query_type: str) -> str:
+        """Get system prompt based on query type"""
+
+        prompts = {
+            QueryType.GENERAL.value: """You are an advanced AI assistant specialized in document analysis and question answering.
+
+Your primary responsibilities:
+- Analyze the provided document context carefully
+- Answer questions based ONLY on the information present in the context
+- Maintain accuracy and avoid hallucination
+- Provide structured, comprehensive responses
+- Cite specific sections when making claims
+
+Key principles:
+- If information is not in the context, clearly state this limitation
+- Use direct quotes from the context when appropriate
+- Organize your response logically with clear reasoning
+- Maintain professional tone while being accessible""",
+            QueryType.FACTUAL.value: """You are a precision-focused AI assistant for factual document analysis.
+
+Your task is to extract and present factual information from documents with maximum accuracy:
+- Only state facts that are explicitly mentioned in the context
+- Use exact quotes when presenting specific data, numbers, or claims
+- If asked about information not in the context, respond with "This information is not available in the provided document"
+- Structure factual responses with clear categorization
+- Distinguish between facts, opinions, and interpretations in the source
+- For lists or itemized information, present them in a clear, organized format""",
+            QueryType.ANALYTICAL.value: """You are an analytical AI assistant specialized in document interpretation and analysis.
+
+Your approach:
+- Analyze the document context for patterns, themes, and key insights
+- Synthesize information from multiple sections when relevant
+- Provide reasoned interpretations based on the available evidence
+- Highlight relationships between different parts of the document
+- Distinguish between what the document states directly vs. what can be reasonably inferred
+- Structure analytical responses with clear reasoning chains
+- Consider multiple perspectives when the document presents them""",
+            QueryType.SUMMARIZATION.value: """You are an expert at document summarization and synthesis.
+
+Your summarization strategy:
+- Identify the main themes and key points from the context
+- Organize information hierarchically (main points, supporting details)
+- Preserve important nuances and qualifications
+- Maintain the original document's tone and perspective
+- Create coherent summaries that capture essential information
+- Use bullet points or structured format when appropriate for clarity
+- Ensure summaries are concise yet comprehensive""",
+        }
+
+        return prompts.get(query_type, prompts[QueryType.GENERAL.value])
+
+    @staticmethod
+    def format_user_prompt(question: str, context: str, query_type: str) -> str:
+        """Format user prompt with context and question"""
+
+        prompt_formats = {
+            QueryType.FACTUAL.value: f"""**DOCUMENT CONTEXT:**
+{context}
+
+**FACTUAL QUERY:** {question}
+
+**INSTRUCTIONS:** Extract and present only the factual information from the document that directly answers this question. Use exact quotes where appropriate and clearly indicate if the requested information is not available.""",
+            QueryType.ANALYTICAL.value: f"""**DOCUMENT CONTEXT:**
+{context}
+
+**ANALYTICAL QUERY:** {question}
+
+**INSTRUCTIONS:** Analyze the document context to provide a comprehensive answer. Consider relationships between different parts of the document and provide reasoned interpretations based on the evidence presented.""",
+            QueryType.SUMMARIZATION.value: f"""**DOCUMENT CONTEXT:**
+{context}
+
+**SUMMARIZATION REQUEST:** {question}
+
+**INSTRUCTIONS:** Create a well-structured summary addressing the request. Organize the information logically and maintain the document's key insights and perspective.""",
+            QueryType.GENERAL.value: f"""**DOCUMENT CONTEXT:**
+{context}
+
+**QUESTION:** {question}
+
+**INSTRUCTIONS:** Based on the document context provided above, give a comprehensive and accurate answer to the question. If the context doesn't contain sufficient information, clearly explain what information is missing.""",
+        }
+
+        return prompt_formats.get(query_type, prompt_formats[QueryType.GENERAL.value])
+
+
+class RAGOptimizer:
+    """Advanced optimization techniques for RAG"""
+
+    @staticmethod
+    async def classify_query_with_llm(
+        question: str,
+        model_name: str,
+        classification_params: Dict[str, Any],
+        openai_client: AsyncOpenAI = None,
+    ) -> str:
+        """Use LLM to classify query type with few-shot examples"""
+        try:
+            classification_prompt = PromptTemplates.get_classification_prompt(question)
+
+            messages = [{"role": "user", "content": classification_prompt}]
+
+            response = await openai_client.chat.completions.create(
+                model=model_name, messages=messages, **classification_params
+            )
+
+            if response.choices and response.choices[0].message.content:
+                predicted_type = response.choices[0].message.content.strip().lower()
+
+                # Validate the predicted type
+                valid_types = [qt.value for qt in QueryType]
+                if predicted_type in valid_types:
+                    logger.info(f"LLM classified query as: {predicted_type}")
+                    return predicted_type
+                else:
+                    logger.warning(f"LLM returned invalid query type: {predicted_type}")
+                    return QueryType.GENERAL.value
+            else:
+                logger.error("LLM classification returned empty response")
+                return QueryType.GENERAL.value
+
+        except APIError as e:
+            logger.error(f"LLM query classification failed: {e}")
+            return QueryType.GENERAL.value
+
+    @staticmethod
+    async def detect_query_type(
+        question: str,
+        model_name: str = None,
+        config: RAGConfig = None,
+        openai_client: AsyncOpenAI = None,
+    ) -> str:
+        """
+        Detect query type using LLM classification
+        """
+        # Try LLM-based classification if enabled and client is available
+        if (
+            config
+            and config.enable_llm_query_classification
+            and openai_client
+            and model_name
+        ):
+            llm_classification = await RAGOptimizer.classify_query_with_llm(
+                question, model_name, config.validation_params, openai_client
+            )
+
+            return llm_classification
+
+        # If LLM classification failed, return General
+        return QueryType.GENERAL.value
+
+    @staticmethod
+    def chunk_optimization(
+        chunks: List[Dict[str, Any]], max_context_length: int = 4000
+    ) -> tuple[List[Dict[str, Any]], str]:
+        """Optimize chunk selection and formatting"""
+
+        if not chunks:
+            return [], ""
+
+        # Build context with length management
+        selected_chunks = []
+        context_parts = []
+        current_length = 0
+
+        for i, chunk in enumerate(chunks):
+            content = chunk.get("content", "")
+
+            # Format each chunk
+            chunk_header = f"--- Document Section {i + 1} (Relevance: {chunk.get('similarity_score', 0):.2f}) ---"
+            formatted_chunk = f"{chunk_header}\n{content}\n"
+
+            # Check length constraints
+            chunk_length = len(formatted_chunk)
+            if current_length + chunk_length > max_context_length:
+                break
+
+            selected_chunks.append(chunk)
+            context_parts.append(formatted_chunk)
+            current_length += chunk_length
+
+        return selected_chunks, "\n".join(context_parts)
+
+    @staticmethod
+    def post_process_llm_response(response: str) -> str:
+        """Post-process LLM response for better formatting"""
+
+        # Remove only consecutive duplicate lines to avoid breaking formatted content.
+        lines = response.split("\n")
+        if not lines:
+            return ""
+
+        filtered_lines = [lines[0]]
+        for i in range(1, len(lines)):
+            current_line_stripped = lines[i].strip()
+            prev_line_stripped = lines[i - 1].strip()
+            if not current_line_stripped or current_line_stripped != prev_line_stripped:
+                filtered_lines.append(lines[i])
+
+        cleaned_response = "\n".join(filtered_lines)
+
+        # Ensure response ends properly
+        if cleaned_response and not cleaned_response.rstrip().endswith(
+            (".", "!", "?", ":")
+        ):
+            cleaned_response = cleaned_response.rstrip() + "."
+
+        return cleaned_response.strip()


This file has grown to nearly 600 lines and contains multiple distinct classes (EnhancedQueryValidator, RAGConfig, QueryClassificationExamples, PromptTemplates, RAGOptimizer). For better maintainability and separation of concerns, consider splitting this file into smaller, more focused modules. For example, prompt generation logic could go into a prompts.py module, and configuration could be in a config.py module.

gemini-code-assist · 2025-09-30T03:01:17Z

+        # Ensure response ends properly
+        if cleaned_response and not cleaned_response.rstrip().endswith(
+            (".", "!", "?", ":")
+        ):
+            cleaned_response = cleaned_response.rstrip() + "."


The post_process_llm_response function unconditionally appends a period . to any response that doesn't end with standard punctuation. This could lead to incorrect formatting if the LLM returns a code snippet, a list, or any other formatted text that shouldn't end with a period. Consider making this behavior conditional or refining the logic to avoid altering non-prose responses.

gemini-code-assist · 2025-09-30T03:01:18Z

+            return ChatResponse(
+                response=f"""I'm sorry, but I couldn't process your query based on the documents in {chat_request.collection_name} that you want to query data from. {validation_error} Please provide a clear, specific question that I can help answer using the available documents in {chat_request.collection_name}.""",
+                relevant_chunks=[],
+                metadata=metadata_without_rag,
+            )


The error message returned when a query fails validation is quite long and repetitive, mentioning the collection name twice. This could be simplified for better readability and a cleaner user experience.

Suggested change

return ChatResponse(

response=f"""I'm sorry, but I couldn't process your query based on the documents in {chat_request.collection_name} that you want to query data from. {validation_error} Please provide a clear, specific question that I can help answer using the available documents in {chat_request.collection_name}.""",

relevant_chunks=[],

metadata=metadata_without_rag,

)

return ChatResponse(

response=f"I'm sorry, I couldn't process your query. {validation_error} Please provide a clear, specific question about the available documents.",

relevant_chunks=[],

metadata=metadata_without_rag,

)

gemini-code-assist · 2025-09-30T03:01:18Z

+            response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/",
+                                        json={"message": prompt,
+                                            "doc_ids": doc_ids,
+                                            "collection_name": "SemanticEmbeds"
+                                        })


The collection_name is hardcoded as "SemanticEmbeds" in the client.post call, ignoring the collection_name parameter passed to the chat_with_rag function. To make the function more reusable, you should use the collection_name parameter in the request payload.

Suggested change

response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/",

json={"message": prompt,

"doc_ids": doc_ids,

"collection_name": "SemanticEmbeds"

})

response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/",

json={"message": prompt,

"doc_ids": doc_ids,

"collection_name": collection_name

})

gemini-code-assist · 2025-09-30T03:01:18Z

+# Pre-staging environment overrides - minimal for disk pressure
+replicaCount: 1
+
+# Pre-staging LLM configuration - External vLLM service on VPN
+llm:
+  baseUrl: "http://100.108.2.57/vllm_qwen2.5/v1"
+  model: "Qwen2.5-14B-Coder-Instruct"  # Model name as served by external vLLM
+
+# Pre-staging resources - minimal for disk pressure
+resources:
+  limits:
+    cpu: 200m
+    memory: 256Mi
+  requests:
+    cpu: 50m
+    memory: 64Mi
+
+# Disable autoscaling for prestaging due to disk pressure
+autoscaling:
+  enabled: false
+
+# Pre-staging secrets
+secrets:
+  create: false
+  name: chat-service-secrets
+
+# Pre-staging specific configurations - using local CRC registry
+image:
+  repository: default-route-openshift-image-registry.apps-crc.testing/omnipdf/chat_service
+  tag: "dev-v0.0.3-5d69f89"
+  pullPolicy: Always  # Use local registry image
+
+# Environment variables
+env:
+  - name: LOG_LEVEL
+    value: "INFO"
+  - name: ENVIRONMENT
+    value: "prestaging"
+  - name: ENABLE_METRICS
+    value: "true"
+
+# Network policy disabled for pre-staging flexibility
+networkPolicy:
+  enabled: true
+  ingress:
+    enabled: true
+    allowedCallers:
+      # PDF Processor Service - RAG conversations
+      - podSelector:
+          matchLabels:
+            app.kubernetes.io/name: pdf-processor-service
+  egress:
+    enabled: true
+    allowDNS: true
+    allowHTTPS: true  # External vLLM API calls
+    allowHTTP: true   # External vLLM API calls
+    allowedTargets:
+      # ChromaDB - query vectors
+      - podSelector:
+          matchLabels:
+            app.kubernetes.io/name: chromadb
+        ports: [8000]
+      # MinIO - future job status + file operations
+      - podSelector:
+          matchLabels:
+            app.kubernetes.io/name: minio
+        ports: [9000]
+      # Redis - future session management
+      - podSelector:
+          matchLabels:
+            app.kubernetes.io/name: redis
+        ports: [6379]
+
+# Pod disruption budget for pre-staging (2 replicas, allow 1 down)
+podDisruptionBudget:
+  enabled: true
+  minAvailable: 1
+
+# Moderate deployment strategy for pre-staging (balanced speed/safety)
+deploymentStrategy:
+  type: RollingUpdate
+  rollingUpdate:
+    minAvailable: 1
+    maxSurge: 1
+
+# Moderate resource quota for pre-staging testing
+serviceMonitor:
+  enabled: true
+  interval: "20s"           # Between staging and prod intervals
+  scrapeTimeout: "8s"       # Slightly shorter timeout for testing
+  labels:
+    environment: prestaging # Help identify prestaging metrics
+
+# Add toleration for disk pressure to allow scheduling
+tolerations:
+  - key: "node.kubernetes.io/disk-pressure"
+    operator: "Exists"
+    effect: "NoSchedule"
+
+# Istio sidecar injection for service mesh
+podAnnotations:
+  sidecar.istio.io/inject: "true"
+  sidecar.istio.io/proxyCPU: "15m"
+  sidecar.istio.io/proxyMemory: "128Mi"
+
+# Service configuration
+service:
+  type: ClusterIP
+  port: 8000
+
+# ServiceAccount configuration
+serviceAccount:
+  create: false
+  automount: false
+
+# Ingress configuration
+ingress:
+  enabled: false
+  className: ""
+  annotations: {}
+  hosts: []
+  tls: []


The main values.yaml file contains values that are specific to a prestaging or development environment (e.g., image repository URLs pointing to a local CRC registry, specific image tags, and an IP-based URL for the LLM). Best practice for Helm charts is for the base values.yaml to contain generic, environment-agnostic defaults, with environment-specific overrides placed in separate files like values-prestaging.yaml. This makes the chart more reusable and easier to manage across different environments.

gemini-code-assist · 2025-09-30T03:01:18Z

+from pydantic import BaseModel, Field
+from typing import Optional, List, Dict, Any
+
+SEMANTIC_EMBEDDING_COLLECTION = "SemanticEmbeds"
+TEXTUAL_EMBEDDING_COLLECTION = "SentenceEmbeds"
+
+
+class ChatRequest(BaseModel):
+    """
+    Request model for chat API endpoints.
+    """
+    message: str
+    doc_ids: list[str]
+    collection_name: Optional[str] = Field(default=SEMANTIC_EMBEDDING_COLLECTION, description="ChromaDB collection name")
+
+
+class ChatResponse(BaseModel):
+    """
+    Response model for chat API
+    """
+    response: str
+    relevant_chunks: List[Dict[str, Any]] = Field(default_factory=list, description="Additional metadata about the RAG process")
+    metadata: Dict[str, Any]


The ChatRequest and ChatResponse Pydantic models defined here are duplicates of the models in chat_service/models/chat.py. This duplication can lead to inconsistencies and maintenance issues if the models diverge. Consider moving these shared models to a common library, such as the existing shared_utils directory, to ensure a single source of truth.

gemini-code-assist · 2025-09-30T03:01:18Z

+CHAT_SERVICE_URL = os.environ["CHAT_SERVICE_URL"]
+
+
+@router.post("/", status_code=201, response_model=ChatResponse)
+async def handle_chat(
+    chat_request: ChatRequest,
+    session_id: str = Depends(get_session_id),
+    _valid_session: bool = Depends(validate_session_id),
+    session_storage: SessionStorage = Depends(get_session_storage)
+):
+    """
+    Handle chat requests with session validation and document access control.
+    If doc_id is provided, validates that the user has access to that document.
+    """
+    for doc_id in chat_request.doc_ids:
+        validate_session_doc_pair(doc_id, session_id, session_storage, _valid_session)
+
+    logger.info(f"Processing chat request for session {session_id}")
+    logger.info(f"Query: {chat_request.message}")
+    logger.info(f"Document IDs: {chat_request.doc_ids}")
+    logger.info(f"Collection: {chat_request.collection_name}")
+
+    # Proxy request to chat service
+    chat_request_dict = chat_request.model_dump()
+    chat_request_dict['session_id'] = session_id
+    return await proxy_post(f"{CHAT_SERVICE_URL}/chat/", chat_request_dict) 


The URL construction for proxying requests to the chat service is a bit confusing. The CHAT_SERVICE_URL environment variable includes a path segment (/chat), and then another /chat/ is appended at the call site. This makes the configuration brittle. It would be clearer to define CHAT_SERVICE_URL as just the base URL of the service (e.g., http://chat_service:8000) and construct the full path at the call site.

Revert "feat: Remove chat feature from OmniPDF"

dabf0b2

This reverts commit 7b3c22e.

NotYuSheng merged commit 527487e into dev Sep 30, 2025
5 checks passed

codiumai-pr-agent-free Bot added the Review effort 4/5 label Sep 30, 2025

gemini-code-assist Bot reviewed Sep 30, 2025

View reviewed changes

NotYuSheng deleted the revert-chat-removal-from-dev branch October 2, 2025 15:09

Conversation

NotYuSheng commented Sep 30, 2025 • edited by codiumai-pr-agent-free Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

Uh oh!

gemini-code-assist Bot commented Sep 30, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

codiumai-pr-agent-free Bot commented Sep 30, 2025

PR Reviewer Guide 🔍

Uh oh!

codiumai-pr-agent-free Bot commented Sep 30, 2025

PR Code Suggestions ✨

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Sep 30, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

NotYuSheng commented Sep 30, 2025 •

edited by codiumai-pr-agent-free Bot

Loading